Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377
Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377benalleng wants to merge 4 commits intopayjoin:masterfrom
Conversation
Coverage Report for CI Build 24522598976Coverage increased (+0.4%) to 84.741%Details
Uncovered Changes
Coverage Regressions3 previously-covered lines in 2 files lost coverage.
Coverage Stats
💛 - Coveralls |
This comment was marked as resolved.
This comment was marked as resolved.
7c97b6c to
8e51345
Compare
DanGould
left a comment
There was a problem hiding this comment.
We can edit the spec to say "the host component of any URL in a BIP77 message MUST be an ASCII LDH (Letters, Digits, Hyphens) plus dots: [a-zA-Z0-9.-] hostname or an IPv4 address literal" to kill IDNA once and for all. For the full URL, we restrict the character set to RFC 3986 unreserved plus the structural delimiters we actually need (:/, ?, #, @). Reject any byte outside printable ASCII (0x21–0x7E) that isn't percent-encoded.
There's a specific URL fuzzer we might consider here: https://github.com/orangetw/Tiny-URL-Fuzzer https://github.com/orangetw/Tiny-URL-Fuzzer/blob/master/samples.txt could also test against
https://payjo.in\@evil.com/path
https://payjo.in%40evil.com/path
https://payjo.in#\@evil.com
https://payjo.in/../evil.com
https://payjo.in/%2e%2e/evil.com
https://evil.com:443@payjo.in/
https://payjo.in%00.evil.com/path
https://payjo.in:@evil.com/
not that these are reall attacks but just so we knwo we parse the same as url for stuff we could actually encounter. Low prio but I wanted to document.
| } | ||
|
|
||
| #[derive(Debug, Clone, PartialEq, Eq)] | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
note to self that this is removed by the time the PR's last commit comes around.
There was a problem hiding this comment.
Do we want support for ipv4 or ipv6 hosts in payjoin::Url if not we can just keep Host::Domain.
There was a problem hiding this comment.
I think for testing especially this is still useful. What do you think?
|
Tnull says "Def. no huge blocker if there is a path to dropping it!" but this seems pretty darn close so I'd like to close it out to keep the prs flowing |
652f4a7 to
4fca182
Compare
96f3353 to
b3bcf75
Compare
6efecbe to
04d22c6
Compare
This commit migrates the monorepo away from the external Url dep to use the new internal Url. Additionally due to the transition we need to add a dep for url encoding with `percent-encoding-rfc3986` which coincidentally get us inline with the bitcoin_uri crate.
xstoicunicornx
left a comment
There was a problem hiding this comment.
Have a bit of feedback.
One other thing, seems like we are replicating the url::form_urlencoded::parse method in two different places so maybe its worth creating a native implementation of this in url.rs?
|
|
||
| pub fn query(&self) -> Option<&str> { self.query.as_deref() } | ||
|
|
||
| pub fn set_query(&mut self, query: Option<&str>) { |
There was a problem hiding this comment.
Since this is a public fn it feels a little weird to have no validation done. Maybe just have a clear_query to easily set query to None and otherwise force usage of query_pairs_mut? Or reduce visibility to pub(crate)?
|
|
||
| pub fn join(&self, segment: &str) -> Result<Url, ParseError> { | ||
| // If the segment is a full URL (scheme://...), parse it independently. | ||
| // Only treat it as a full URL if :// appears before any / (i.e. in scheme position). |
There was a problem hiding this comment.
Nit: this comment was confusing to me when I originally read it
| // Only treat it as a full URL if :// appears before any / (i.e. in scheme position). | |
| // Only treat it as a full URL if no / appears before :// (i.e. in scheme position). |
| scheme.push(c); | ||
| } | ||
| ':' => break, | ||
| _ => return Err(ParseError::InvalidCharacter), |
There was a problem hiding this comment.
Maybe just use ParseError::InvalidScheme instead, as this is more consistent with how parse_port errors are handled
| let mut path = String::new(); | ||
| let mut query: Option<String> = None; | ||
| let mut fragment: Option<String> = None; | ||
|
|
||
| if let Some(frag_pos) = input.find('#') { | ||
| let before_fragment = &input[..frag_pos]; | ||
| fragment = Some(input[frag_pos + 1..].to_string()); | ||
|
|
||
| if let Some(q_pos) = before_fragment.find('?') { | ||
| path.push_str(&before_fragment[..q_pos]); | ||
| query = Some(before_fragment[q_pos + 1..].to_string()); | ||
| } else { | ||
| path.push_str(before_fragment); | ||
| } | ||
| } else if let Some(q_pos) = input.find('?') { | ||
| path.push_str(&input[..q_pos]); | ||
| query = Some(input[q_pos + 1..].to_string()); | ||
| } else { | ||
| path.push_str(input); | ||
| } |
There was a problem hiding this comment.
This is a bit confusing to follow why not just:
| let mut path = String::new(); | |
| let mut query: Option<String> = None; | |
| let mut fragment: Option<String> = None; | |
| if let Some(frag_pos) = input.find('#') { | |
| let before_fragment = &input[..frag_pos]; | |
| fragment = Some(input[frag_pos + 1..].to_string()); | |
| if let Some(q_pos) = before_fragment.find('?') { | |
| path.push_str(&before_fragment[..q_pos]); | |
| query = Some(before_fragment[q_pos + 1..].to_string()); | |
| } else { | |
| path.push_str(before_fragment); | |
| } | |
| } else if let Some(q_pos) = input.find('?') { | |
| path.push_str(&input[..q_pos]); | |
| query = Some(input[q_pos + 1..].to_string()); | |
| } else { | |
| path.push_str(input); | |
| } | |
| let (before_fragment, fragment) = match input.find('#') { | |
| Some(pos) => (&input[..pos], Some(input[pos + 1..].to_string())), | |
| None => (&input[..], None), | |
| }; | |
| let (path, query) = match before_fragment.find('?') { | |
| Some(pos) => | |
| (before_fragment[..pos].to_string(), Some(before_fragment[pos + 1..].to_string())), | |
| None => (before_fragment.to_string(), None), | |
| }; |
| Some(ref h) if h.is_empty() => return Err(ParseError::EmptyHost), | ||
| Some(Host::Domain(ref d)) | ||
| if !d.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.') => | ||
| return Err(ParseError::InvalidHost), |
There was a problem hiding this comment.
Wouldn't it make more sense for these parsing errors to be caught in parse_host (would also be more consistent with the other parsing functions)?
Also, if we are erroring for ParseError::EmptyHost why does the host need to be an Option<Host> rather than just Host? This validation seems to ensure that the host will always exist. Not using an Option<Host> would also eliminate the need for has_host.
Closes #1124
This adds the internal Url Struct to no longer depend on the url crate directly for it.
However we still have reqwest inside payjoin url from payjoin-directory and test-utils which pull in the url dep, but that is only available on the
iofeature.I have also added a fuzz target, which already helped me better shape the parse function better.
I have only ran the fuzzer for a limited amount of time however and some more cpu hours might be helpful
Claude really helped me get through the brunt of this.
You can test the lack of an accessible url dep not including the io feature with this command.
Pull Request Checklist
Please confirm the following before requesting review:
AI
in the body of this PR.